WeRateDogs is a Twitter account that rates people's dogs and adds a humorous comment about each one. These ratings almost always have a denominator of 10. The numerators, though? Almost always greater than 10: 11/10, 12/10, 13/10, and so on. Why? Because "they're good dogs Brent." WeRateDogs has over 4 million followers and has received international media coverage. A dataset was gathered about some of the dogs featured on the WeRateDogs page. Among the fields in the data are the tweet ID, the retweet count, the favorite count, the rating numerator and denominator, and the dog's breed.
This information was analyzed, and I'll be sharing some interesting outcomes of my analysis with you.
Do you think there is any correlation between favorite_count, retweet_count and rating_numerator? I would have expected any dog with a high rating to have a high retweet and/or favorite count. This, however, was not the case. Below are scatter plots of these variables:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import plotly.express as px
df = pd.read_csv('twitter_archive_master.csv')
# Scatter plot of retweet_count against favorite_count with an OLS trendline
# (trendline="ols" requires the statsmodels package to be installed)
fig = px.scatter(df, x="retweet_count", y="favorite_count", color="tweet_id", trendline="ols")
fig.show()
There is a strong correlation between these two values (retweet_count and favorite_count), with a correlation coefficient of 0.86.
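For readers curious about what a correlation coefficient actually measures, here is a minimal sketch of the Pearson formula in plain Python. The function name `pearson_r` and the toy numbers are my own, for illustration only. They are not taken from the dataset, and this is not necessarily how the 0.86 above was obtained.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (covariance numerator)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Toy numbers for illustration only, not taken from the dataset
retweets = [10, 20, 30, 40]
favorites = [25, 45, 65, 85]  # exactly linear in retweets, so r is close to 1
print(pearson_r(retweets, favorites))
```

With pandas, `df["retweet_count"].corr(df["favorite_count"])` computes the same Pearson coefficient by default.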
Next, here is the scatter plot of rating_numerator against retweet_count:
# Scatter plot of rating_numerator against retweet_count with an OLS trendline
fig = px.scatter(df, x="rating_numerator", y="retweet_count", color="tweet_id", trendline="ols")
fig.show()
There is a very weak correlation between rating_numerator and retweet_count, with a correlation coefficient of 0.08. Since favorite_count and retweet_count are strongly correlated, this suggests that the correlation between rating_numerator and favorite_count is weak as well.
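Rather than inferring the third relationship from the other two, all three pairs can be checked directly. The sketch below uses pandas' `DataFrame.corr`. The helper name `correlation_table` and the toy rows are assumptions of mine for illustration, not values from the real archive.

```python
import pandas as pd

def correlation_table(df, columns):
    """Pairwise Pearson correlations for the given numeric columns."""
    return df[columns].corr(method="pearson")

# Toy rows for illustration only, not the real archive
toy = pd.DataFrame({
    "rating_numerator": [10, 11, 12, 13, 12, 11],
    "retweet_count": [500, 100, 900, 300, 200, 700],
    "favorite_count": [1500, 400, 2600, 1000, 700, 2000],
})

table = correlation_table(
    toy, ["rating_numerator", "retweet_count", "favorite_count"]
)
print(table.round(2))
```

On the real data, the same call would be `correlation_table(df, [...])` with the `df` loaded from the CSV above. The cell at row `rating_numerator`, column `favorite_count` is the value of interest.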
A further review was done to find the dogs with the highest and lowest favorite counts.
The dog with the highest favorite count, 156,628, is a Labrador_retriever, which also has a high rating: a numerator of 13 against a denominator of 10.
The dog with the lowest favorite count, 72, is an English_setter. Yet this dog also has a high rating, with a numerator of 11 against a denominator of 10. This further supports the finding that rating_numerator and favorite count are only weakly correlated.
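One way such a lookup could be done is with `idxmax` and `idxmin`. This is a sketch under assumptions: the column name `breed`, the helper `favorite_extremes`, and the toy rows are hypothetical, and the real file may name its breed column differently.

```python
import pandas as pd

def favorite_extremes(df, count_col="favorite_count"):
    """Return the rows with the highest and lowest value in count_col."""
    most = df.loc[df[count_col].idxmax()]
    least = df.loc[df[count_col].idxmin()]
    return most, least

# Toy rows for illustration only; "breed" is a hypothetical column name
toy = pd.DataFrame({
    "breed": ["breed_a", "breed_b", "breed_c"],
    "rating_numerator": [13, 11, 12],
    "favorite_count": [9000, 70, 1200],
})

most, least = favorite_extremes(toy)
print(most["breed"], most["rating_numerator"], most["favorite_count"])
print(least["breed"], least["rating_numerator"], least["favorite_count"])
```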
The highest rating numerator in the data set is 14, and here's the list of dog breeds that made the 14 list:
Pembroke, Samoyed, French_bulldog, Chihuahua, black-and-tan_coonhound, bloodhound, golden_retriever, Bedlington_terrier, Rottweiler, Pomeranian, Irish_setter, Gordon_setter, standard_poodle, French_bulldog, golden_retriever, Eskimo_dog
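A list like this could be produced by filtering on the maximum numerator. The sketch below assumes a hypothetical `breed` column and uses toy rows, so treat it as an illustration rather than the exact query behind the list.

```python
import pandas as pd

def breeds_with_top_rating(df, rating_col="rating_numerator", breed_col="breed"):
    """Return the highest rating and the breeds of all rows that received it."""
    top = df[rating_col].max()
    # One entry per matching row, so a breed can appear more than once
    breeds = df.loc[df[rating_col] == top, breed_col].dropna().tolist()
    return top, breeds

# Toy rows for illustration only; "breed" is a hypothetical column name
toy = pd.DataFrame({
    "breed": ["breed_a", "breed_b", "breed_c", "breed_d"],
    "rating_numerator": [14, 12, 14, 13],
})

top, breeds = breeds_with_top_rating(toy)
print(top, breeds)
```

Because the filter keeps one entry per row, a breed shows up once for every top-rated dog of that breed.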
That'll be all for the summary of my analysis.